Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the application: 

Listing of Claims:

1. (Currently amended) A data processing machine implemented method of selecting data sets for 
use with a predictive algorithm based on data network geographical information, comprising data 
processing machine implemented steps of:

generating a first statistical distribution of a training data set; 

generating a second statistical distribution of a testing data set; 

[[comparing]] using the first statistical distribution and the second statistical distribution to identify a 
discrepancy between the first statistical distribution and the second statistical distribution with respect to 
the data network geographical information by comparing at least one of the first statistical distribution 
and the second statistical distribution to a statistical distribution of a customer database to determine if at 
least one of the training data set and the testing data set are geographically representative of a customer 
population represented by the customer database;

modifying selection of entries in one or more of the training data set and the testing data set based 
on the discrepancy between the first statistical distribution and the second statistical distribution; and 

using the modified selection of entries by the predictive algorithm. 

2. (Previously presented) The method of claim 1, wherein the first statistical distribution and the 
second statistical distribution are distributions of a number of data network links from a customer data 
network geographical location to a web site data network geographical location. 

3. (Previously presented) The method of claim 1, wherein the first statistical distribution and the 
second statistical distribution are distributions of a size of a click stream for arriving at a web site data 
network geographical location.

4. (Previously presented) The method of claim 1, wherein comparing the first statistical distribution 
and the second statistical distribution includes comparing one or more of a mean, mode, and standard 
deviation of the first statistical distribution to one or more of a mean, mode, and standard deviation of the 
second statistical distribution.

5. (Previously presented) The method of claim 1, wherein the first statistical distribution and the 
second statistical distribution are distributions of a weighted data network geographical 
distance between a customer data network geographical location and a web site data network 
geographical location.

6. (Previously presented) The method of claim 1, wherein the first statistical distribution and the 
second statistical distribution are distributions of a weighted click stream for arriving at a web site data 
network geographical location.

7. (Previously presented) The method of claim 1, wherein modifying selection of entries in one or 
more of the training data set and the testing data set includes generating recommendations for improving 
selection of entries in one or more of the training data set and the testing data set, and wherein the method 
of claim 1 further comprises re-generating at least one of the first statistical distribution and the second 
statistical distribution based upon the recommendations.

8. (Previously presented) The method of claim 1, wherein the training data set and the testing data 
set are selected from a customer information database comprising information with respect to customers 
who have purchased any of goods and services over a data network, wherein the data network geographic 
information pertains to geographic information of the data network. 

9. (Cancelled) 

10. (Previously presented) The method of claim 1, wherein the first statistical distribution and 
second statistical distribution are frequency distributions of number of data network links between a 
customer geographical location and one or more web site data network geographical locations, and size of 
a click stream for arriving at one or more web site data network geographical locations. 

11. (Currently amended) The method of claim [[9]] 1, wherein comparing at least one of the first 
statistical distribution and the second statistical distribution to a statistical distribution of a customer 
database includes:

generating a composite data set from the training data set and the testing data set; and 
generating a composite statistical distribution from the composite data set that was generated 
from the training data set and the testing data set. 


PAGE 5/20 * RCVD AT 6/12/2006 4:34:07 PM [Eastern Daylight Time] * SVR:USPTO-EFXRF-5/22 * DNIS:2738300 * CSID:972 385 7766 * DURATION (mm-ss):06-16
Jun 12 2006 3:35PM YEE & ASSOCIATES, P.C. (972) 385-7766

12. (Previously presented) The method of claim 1, wherein modifying selection of entries in one or 
more of the training data set and the testing data set includes changing one of a random selection 
algorithm and a seed value for the random selection algorithm, and then re-comparing the first statistical 
distribution and the second statistical distribution. 

13. (Previously presented) The method of claim 1, wherein using the modified selection of entries by 
the predictive algorithm includes training the predictive algorithm using at least one of the training data 
set and the testing data set if the discrepancy is within a predetermined tolerance.

14. (Original) The method of claim 13, wherein the predictive algorithm is a discovery based data 
mining algorithm. 

15. (Currently amended) An apparatus for selecting data sets for use with a predictive algorithm 
based on data network geographical information, comprising: 

a statistical engine; 

a comparison engine coupled to the statistical engine, wherein the statistical engine generates a 
first statistical distribution of a training data set and a second distribution of a testing data set, the 
comparison engine [[compares]] uses the first statistical distribution and the second distribution to identify a 
discrepancy between the first statistical distribution and the second distribution with respect to the data 
network geographical information by comparing at least one of the first statistical distribution and the 
second statistical distribution to a statistical distribution of a customer database to determine if at least one 
of the training data set and the testing data set are geographically representative of a customer population 
represented by the customer database, modifies selection of entries in one or more of the training data set 
and the testing data set based on the discrepancy between the first statistical distribution and the second 
distribution, and provides the modified selection of entries for use by the predictive algorithm; and

a predictive algorithm device that uses the modified selection of entries and the predictive 
algorithm. 

16. (Previously presented) The apparatus of claim 15, wherein the first statistical distribution and the 
second statistical distribution are distributions of a number of data network links from a customer data 
network geographical location to a web site data network geographical location. 
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17. (Previously presented) The apparatus of claim 15. wherein the first statistical distribution and the 
second statistical distribution are distributions of a size of a click stream to arrive at a web site data 
network geographical location. 

18. (Previously presented) The apparatus of claim 15, wherein the comparison engine compares the 
first statistical distribution and the second statistical distribution by comparing one or more of a mean, 
mode, and standard deviation of the first statistical distribution to one or more of a mean, mode, and 
standard deviation of the second statistical distribution. 

19. (Previously presented) The apparatus of claim 15, wherein the first statistical distribution and the 
second statistical distribution are distributions of a weighted number of data network links between a 
customer data network geographical location and a web site data network geographical location. 

20. (Previously presented) The apparatus of claim 15, wherein the first statistical distribution and the 
second statistical distribution are distributions of a weighted size of a click stream to arrive at a web site 
data network geographical location. 

21. (Previously presented) The apparatus of claim 15, wherein the comparison engine modifies 
selection of entries in one or more of the training data set and the testing data set by generating 
recommendations for improving selection of entries in one or more of the training data set and the testing 
data set, and wherein the statistical engine re-generates at least one of the first statistical distribution and 
the second statistical distribution based upon the recommendations. 

22. (Previously presented) The apparatus of claim 15, further comprising a training data set/testing 
data set selection device that selects the training data set and the testing data set from a customer 
information database comprising information with respect to customers who have purchased any of goods 
and services over a data network, wherein the data network geographic information pertains to geographic 
information of the data network. 

23. (Cancelled) 

24. (Previously presented) The apparatus of claim 15, wherein the first statistical distribution and 
second statistical distribution are frequency distributions of a number of data network links between a 
customer data network geographical location and one or more web site data network geographical 
locations, and a size of a click stream to arrive at one or more web site data network geographical 
locations.

25. (Currently amended) The apparatus of claim [[23]] 15, wherein the comparison engine compares 
at least one of the first statistical distribution and the second statistical distribution to a statistical 
distribution of a customer database by: 

generating a composite data set from the training data set and the testing data set; and 
generating a composite statistical distribution from the composite data set that was generated 
from the training data set and the testing data set. 

26. (Previously presented) The apparatus of claim 15, wherein the comparison engine modifies 
selection of entries in one or more of the training data set and the testing data set by changing one of a 
random selection algorithm and a seed value for the random selection algorithm, and then re-comparing 
the first statistical distribution and the second statistical distribution. 

27. (Previously presented) The apparatus of claim 15, wherein the predictive algorithm device is 
trained using at least one of the training data set and the testing data set if the discrepancy is within a 
predetermined tolerance. 

28. (Original) The apparatus of claim 27, wherein the predictive algorithm is a discovery based data 
mining algorithm. 

29. (Currently amended) A computer program product in a computer readable medium comprising 
[[a data structure]] instructions for enabling a data processing machine to select data sets for use with a 
predictive algorithm based on data network geographical information, comprising:

first instructions for generating a first statistical distribution of a training data set; 

second instructions for generating a second statistical distribution of a testing data set; 

third instructions for [[comparing]] using the first statistical distribution and the second statistical 
distribution to identify a discrepancy between the first statistical distribution and the second statistical 
distribution with respect to the data network geographical information by comparing at least one of the 
first statistical distribution and the second statistical distribution to a statistical distribution of a customer 
database to determine if at least one of the training data set and the testing data set are geographically 
representative of a customer population represented by the customer database;

fourth instructions for modifying selection of entries in one or more of the training data set and 
the testing data set based on the discrepancy between the first statistical distribution and the second 
statistical distribution; and

fifth instructions for using the modified selection of entries by the predictive algorithm. 

30. (Previously presented) The computer program product of claim 29, wherein the first statistical 
distribution and the second statistical distribution are distributions of a number of data network links from 
a customer data network geographical location to a web site data network geographical location. 

31. (Previously presented) The computer program product of claim 29, wherein the first statistical 
distribution and the second statistical distribution are distributions of a size of a click stream to arrive at a 
web site data network geographical location.

32. (Previously presented) The computer program product of claim 29, wherein the third instructions 
for comparing the first statistical distribution and the second statistical distribution include instructions for 
comparing one or more of a mean, mode, and standard deviation of the first statistical distribution to one 
or more of a mean, mode, and standard deviation of the second statistical distribution. 

33. (Previously presented) The computer program product of claim 29, wherein the first statistical 
distribution and the second statistical distribution are distributions of a weighted number of data network 
links between a customer data network geographical location and a web site data network geographical 
location. 

34. (Previously presented) The computer program product of claim 29, wherein the first statistical 
distribution and the second statistical distribution are distributions of a weighted size of a click stream to 
arrive at a web site data network geographical location. 

35. (Previously presented) The computer program product of claim 29, wherein the fourth 
instructions for modifying selection of entries in one or more of the training data set and the testing data 
set include instructions for generating recommendations for improving selection of entries in one or more 
of the training data set and the testing data set, and wherein the computer program product of claim 29 
further comprises instructions for re-generating at least one of the first statistical distribution and the 
second statistical distribution based upon the recommendations.

36. (Cancelled)

37. (Previously presented) The computer program product of claim 29, wherein the first statistical 
distribution and second statistical distribution are frequency distributions of a number of data network 
links between a customer data network geographical location and one or more web site data network 
geographical locations, and a size of a click stream to arrive at one or more web site data network 
geographical locations. 

38. (Currently amended) The computer program product of claim [[36]] 29, wherein the fifth 
instructions include:

instructions for generating a composite data set from the training data set and the testing data set; 
and

instructions for generating a composite distribution from the composite data set that was 
generated from the training data set and the testing data set.

39. (Previously presented) The computer program product of claim 29, wherein the fourth 
instructions for modifying selection of entries in one or more of the training data set and the testing data 
set include instructions for changing one of a random selection algorithm and a seed value for the random 
selection algorithm, and then re-comparing the first statistical distribution and the second statistical 
distribution.

40. (Previously presented) The computer program product of claim 29, wherein the fifth instructions 
include instructions for training the predictive algorithm using at least one of the training data set and the 
testing data set if the discrepancy is within a predetermined tolerance. 

41. (Currently amended) A data processing machine implemented method of predicting customer 
behavior based on data network geographical influences, comprising data processing machine 
implemented steps of:

obtaining data network geographical information regarding a plurality of customers, the data 
network geographic information comprising frequency distributions of both (i) number of data network 
links between a customer geographical location and one or more web site data network geographical 
locations, and (ii) size of a click stream for arriving at the one or more web site data network geographical 
locations;

training a predictive algorithm using the data network geographical information; and

using the predictive algorithm to predict customer behavior based on the data network 
geographical information.

42. (Currently amended) An apparatus for predicting customer behavior based on data network 
geographical influences, comprising:

means for obtaining data network geographical information regarding a plurality of customers, 
the data network geographic information comprising frequency distributions of both (i) number of data 
network links between a customer geographical location and one or more web site data network 
geographical locations, and (ii) size of a click stream for arriving at the one or more web site data network 
geographical locations;

means for training a predictive algorithm using the data network geographical information; and 
means for using the predictive algorithm to predict customer behavior based on the data network 
geographical information. 

43. (Currently amended) A computer program product in a computer readable medium comprising 
[[a data structure]] instructions for enabling a data processing machine to predict customer behavior based on 
data network geographical influences, comprising:

first instructions for obtaining data network geographical information regarding a plurality of 
customers, the data network geographic information comprising frequency distributions of both (i) 
number of data network links between a customer geographical location and one or more web site data 
network geographical locations, and (ii) size of a click stream for arriving at the one or more web site data 
network geographical locations;

second instructions for training a predictive algorithm using the data network geographical 
information; and 

third instructions for using the predictive algorithm to predict customer behavior based on the 
data network geographical information.
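
For illustration only, and not as part of the claims, the selection loop recited in claims 1, 4, 12 and 13 might be sketched in Python as follows. The sketch summarizes each data set by its mean, mode and standard deviation, compares the training set, the testing set and the full customer database, and re-draws the split with a new seed until the discrepancy is within a tolerance. Every name here (`summarize`, `discrepancy`, `select_sets`, the `links` field), the 70/30 split, the retry limit and the largest-gap discrepancy measure are assumptions made for the example; the claims do not prescribe them.

```python
import random
import statistics


def summarize(values):
    """Return (mean, mode, population standard deviation) of numeric values."""
    return (
        statistics.mean(values),
        statistics.mode(values),
        statistics.pstdev(values),
    )


def discrepancy(stats_a, stats_b):
    """Largest absolute gap between corresponding summary statistics."""
    return max(abs(a - b) for a, b in zip(stats_a, stats_b))


def select_sets(customers, key, tolerance, train_fraction=0.7, max_tries=50):
    """Re-split with a new seed until both sets match the customer database.

    Hypothetical helper: the 70/30 split, the retry limit and the
    largest-gap discrepancy measure are assumptions, not taken from the claims.
    """
    reference = summarize([key(c) for c in customers])
    cut = round(len(customers) * train_fraction)
    for seed in range(max_tries):
        shuffled = list(customers)
        random.Random(seed).shuffle(shuffled)  # changing the seed changes the selection
        train, test = shuffled[:cut], shuffled[cut:]
        train_stats = summarize([key(c) for c in train])
        test_stats = summarize([key(c) for c in test])
        worst = max(
            discrepancy(train_stats, test_stats),
            discrepancy(train_stats, reference),
            discrepancy(test_stats, reference),
        )
        if worst <= tolerance:
            return train, test, seed
    raise ValueError("no split within tolerance; widen it or add data")


# Made-up customer records; "links" stands in for the number of data
# network links between a customer and a web site.
link_counts = [3] * 100 + [2] * 40 + [4] * 40 + [1] * 10 + [5] * 10
customers = [{"id": i, "links": n} for i, n in enumerate(link_counts)]
train, test, seed = select_sets(customers, key=lambda c: c["links"], tolerance=0.5)
print(len(train), len(test), seed)
```

Comparing both sets against the whole customer list, and not only against each other, mirrors the "geographically representative of a customer population" limitation. A weighted distance or click stream size could be substituted through the `key` argument.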